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Abstract. We consider both discrete and continuous control problems constrained by a fixed 
budget of some resource, which may be renewed upon entering a preferred subset of the state space. 
In the discrete case, we consider both deterministic and stochastic shortest path problems with full 
budget resets in all preferred nodes. In the continuous case, we derive augmented PDEs of optimal 
control, which are then solved numerically on the extended state space with a full/instantaneous 
budget reset on the preferred subset. We introduce an iterative algorithm for solving these problems 
efficiently. The method's performance is demonstrated on a range of computational examples, 
including the optimal path planning with constraints on prolonged visibility by a static enemy 
observer. 

In addition, we also develop an algorithm that works on the original state space to solve a 
related but simpler problem: finding the subsets of the domain "reachable-within-the-budget" . 
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Dynamic programming provides a convenient framework for finding provably "optimal" strategies 
to control both discrete and continuous systems. The optimality is usually defined with respect to 
fH a single criterion or cost (e.g., money, or fuel, or time needed to implement each particular control). 

The value function is defined as the cost of such optimal control for each starting position a; e il, 

Sand the Dynamic Programming (DP) equations are then solved to recover this value function. 
I I Realistic applications usually involve several different criteria for evaluating control strategies 

and the notion of "optimality" becomes more complicated. One natural approach is to focus on a 
^""^ single "primary" cost to be optimized, while treating all other ("secondary") costs as constraints. 

^ A typical application of this type is to find the fastest path to the target subject to constraints 

on the maximum use of energy along that path. Another possible constraint is on the maximum 
total amount of time that a robot can be visible to an unfriendly observer while moving to a target. 
\^ Kumar and Vladimirsky have recently introduced efficient numerical methods for these and similar 

constrained-optimal control problems in continuous domains [53] . We will refer to such problems as 
y—i "budget-constrained" since the original state space will have to be expanded to keep track of the 

T-H remaining "resource-budget" to satisfy the constraint. (E.g., two different starting energy-budgets 

may well result in very different "constrained time-optimal" paths, even when the starting physical 
J> position is the same.) 

This state space expansion leads to a significantly larger system of augmented DP equations or a 
^ higher-dimensional domain in the continuous case. Thus, the computational efficiency of numerical 

^ methods becomes particularly important. In many applications it is natural to assume that the 

resource budgets are non-renewable and that every change in the system state results in an immediate 
budget decrease. This introduces a natural "causal ordering" on the expanded sate space: the 
knowledge of energy-constrained time-optimal controls for the starting energy-budget bi is very 
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useful in computing such constrained-optimal controls for a higher budget 62 > ^i- The efficiency 
of methods introduced in ^H] hinges on this explicit causality. 

In this paper we focus on a more general situation, where the budget can be instantaneously 
reset to its maximum value by visiting a special part of the state space 5 C and the budget is 
decreasing when moving through the rest of the state space U = V,\S. If the limited resource is 
fuel, S can be thought of as a discrete set of gas stations. On the other hand, if the secondary 
cost is the vehicle's exposure to an unfriendly observer, the S can be interpreted as a "safe" part 
of the domain protected from observation, and the constraint is on the maximum contiguous period 
of time that the system is allowed to spend in an "unsafe" (observable) set U. To the best of our 
knowledge, our formulation of continuous versions of such problems is new and no efficient methods 
for these were previously available. We show that such "budget-resets" result in a much more subtle 
implicit causality in the expanded state space. Nevertheless, under mild technical assumptions, 
non-iterative methods are available for deterministic (section [2]) and even for certain stochastic 
(section [s]) budget-reset problems on graphs. In the continuous case, this problem is described by 
a controlled hybrid system (section |4|, whose value function is a discontinuous viscosity solution of 
two different Hamilton- Jacobi PDEs posed on S and U x B, where B is the set of possible budget 
levels available to satisfy secondary constraints. The characteristics of these PDEs coincide with 
the constrained-optimal trajectories and define the direction of information flow in the continuous 
state space. Unfortunately, the most natural semi-Lagrangian discretization of these PDEs is no 
longer causal, making iterative numerical methods unavoidable (section [5| . Our general approach 
is equally efficient even in the presence of strong inhomogenieties and anisotropics in costs and/or 
dynamics of the system. In section [6j we provide numerical evidence of the method's convergence 
and illustrate the key properties on several optimal control examples, including prolonged- visibility- 
avoidance problems. (The latter problems originated from Ryo Takei's joint work with Richard Tsai 
and Yanina Landa on visibility and surveillance-evasion.) 

We note that for budget-reset problems, finding constrained-optimal controls is significantly 
harder than just identifying the "reachable" subset of fi; i.e., the set of states from which it's 
possible to steer the system to the target without violating the constraints, provided one starts with 
the maximum budget. An efficient algorithm for the latter problem is presented in Appendix [X] We 
conclude by discussing the limitations of our approach and the directions of future work in section 

El 

Section 2. Deterministic SP on graphs with renewable resources. 

The classical problem of finding the shortest path in a directed graph with non-negative arclengths 
is among those most exhaustively studied. Fast iterative ( "lab el- correct ing" ) and fast non-iterative 
("label-setting") methods for it are widely used in a variety of applications. 

Consider a directed graph on a set of nodes X = {xi, . . . ,Xm, xm+i — t}, where the last node t 
is the target. For each node Xi G X, there is a set of immediate neighbors Ni — N{xi) C X\xi and 
for each neighbor Xj G Ni there is a known transition cost Cij — C{xi,Xj) > 0. For convenience, 
we will take Cij — +00 if Xj ^ Ni. We will further assume that our graph is relatively sparse; i.e., 
maxi |iVi| < K ^ AI. li y — {y^ — Xi^y^, ...,1/^ = t) is a path from Xi to the target, its total cost 
is defined as J^{y) = J2k~o^iykiyk+i)- The value function Ui = U{xi) is defined as the cost along 
an optimal path. Clearly — and for all other nodes the Bellman optimality principle yields a 
system of M non-linear coupled equations: 

(1) U,= mm {C,j+U,}; ^eI = {l,...,M}. 

(This definition makes Ui — +00, whenever there is no path from Xi to t, including the cases where 
Ni ~ ^. Throughout this paper, we will use the convention that the minimum over an empty set 
is +00.) Dijkstra's classical method [TH] can be used to solve the system ([ij non-iteratively in 
0(M log Af) operations. A detailed discussion of this and other label-setting and label-correcting 
algorithms can be found in standard references; e.g. [I] [5]. 

In many applications, a natural extension of this problem requires keeping track of several different 
types of cost (e.g., money, fuel, time) associated with each transition. The goal is then to either 
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find all Pareto optimal paths or to treat one criterion/cost as primary (to be minimized) and all 
others as secondary, providing the constraints to restrict the set of allowable paths. (E.g., what is 
the quickest path given the maximum amount of money we can spend along the way?) A number of 
algorithms for such multi-objective dynamic programming are also well-known |25l 1301 [26] . 

One natural approach to bicriterion problems involves finding a simple, single-criterion optimal 
path but in an expanded graph with the node set X = X x B. We begin with a similar technique 
adopted to our "constrained resources" scenario. Our description emphasizes the causal properties 
of the model, also reflected by the numerical methods for the continuous case in section [5] We 
assume the secondary resource-cost for each transition is specified as ctj = c{xi,Xj). For simplicity 
we assume that these secondary costs are conveniently quantized; e.g., aj G Z. We will use i? > to 
denote the maximal allowable budget and B — {0, . . . , B} to denote the set of allowable budget levels. 
In the extended graph, a node x^gX corresponds to a location Xi d X with the resource budget 
b. The use of resources occurs when c^- > 0, and the renewal/recharge of resources corresponds to 
Cij < 0. If we are starting from Xi with a secondary-budget b and decide to move to Xj G Ni, in the 
expanded graph this becomes a transition from x^ to a;^ '^'^ . 

We now define the value function Wj^ = W{x^) as the minimum accumulated primary-cost from 
Xi to t, but minimizing only among the paths along which the budget always remains in B: 

Wj; = min J{y), 
yeY''(Xi) 

where Y^{xi) is the set of "feasible paths" (i.e., those that can be traversed if starting from Xi with 
resource budget b). 

Remark 2.1. Feasible paths should clearly include only those, along which the resource budget 
remains non-negative. But since we are allowing for secondary costs c^- of arbitrary sign, this 
requires a choice between two different "upper budget bound" interpretations: 

1) Any attempt to achieve a budget higher than B makes a path infeasible; i.e., since we are 
starting from Xi with budget b £ B, 

Y''{x,)^i^y={yo = x,,...,y, = t) | < - Ec(yfc,y,+i)^ < B, V s < r| . 

In this case, the optimality principle would yield a system of equations on the expanded graph: 
W''^ min ICu +W^''''']; W b £ B; V z e I; 

with the following "boundary conditions" : 

Wl^Q, ybe B; W\ = +00, \/ b ^ B, ^ x, e X. 

In practice, the latter condition can be omitted if we minimize over = {xj E Ni \ {b — cij) € B} 
instead of Ni. 

2) An interpretation more suitable for our purposes is to assume that any resources in excess of B 
are simply not accumulated (or are immediately lost), but the path remains allowable. If we define 
a left-associative operation a Q jS ~ min(Q; — P,B), then 

Y\x,) = {y={yo = x,,...,y,.=t) \ < (6 c(yo, y^) e . . . G c(y,„i, y J) < B, V s < r} , 

and the optimality principle on the expanded graph can be expressed as follows: 

(2) W^^ min (C„- + ; V 6 G 6; V i G X, 

with = 0, V 6 e S and N\ = \xj £ Ni \ Cij < b}. Unlike the previous interpretation, this 
definition ensures that the value function is non-increasing in b (since the set of feasible paths is 
non-decreasing as we increase the budget). Thus, we will also similarly interpret the "upper budget 
bound" in sections [3] and ID 
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Remark 2.2. It is natural to view the expanded graph as (B + 1) "6-sUces" (i.e., copies of the orig- 
inal graph) stacked vertically, with the transitions between slices corresponding to Qj-'s. (Though, 
since the total costs of feasible paths do not depend on the budget remaining upon reaching the 
target, it is in fact unnecessary to have multiple copies of t in the expanded graph; see Figure 
[2]). We note that the signs of the secondary costs strongly influence the type of causality present in 
the system as well as the efficiency of the corresponding numerical methods. We consider three cases: 

1. Since the primary costs are non- negative, the system ^ can be always solved by Dijkstra's 
method on an expanded graph regardless of the signs of Cij's. The computational cost of this 
approach is 0{M log M), where M — M x (i? + 1) is the number of nodes in the expanded graph 
(not counting the target). We will refer to this most general type of causality as implicit. 

2. However, if c^-'s are strictly positive, Dijkstra's method becomes unnecessary since the budget 
will strictly decrease along every path. In this case — +00 for all Xi G -''^\{*} and each 
depends only on nodes in the lower slices. We will refer to this situation as explicitly causal since 
we can compute the value function in 0{M) operations, in a single sweep from the bottom to the 
top slice. 

3. On the other hand, if the secondary costs are only non-negative, the budget is non-increasing 
along every path and we can use Dijkstra's method on one 6-slice at a time (again, starting from 
6 = and advancing to b = B) instead of using it on the entire expanded graph at once. This 
semi-implicit causality results in the computational cost of 0(M log Af). 

Remark 2.3. The sign of secondary costs also determines whether the value function actually 
depends on the maximum allowable resource level. E.g., suppose the solution Wj^ was already 
computed for all b G B — {0, . . . , B} and it is now necessary to solve the problem again, but for a 
wider range of resource budgets B = {0, . . . , B}, where B > B. Assuming that W solves the latter 
problem, a useful consequence of explicit and semi-implicit causality is the fact that Wj^ = for 
all b € B; thus, the computational costs of this extension are decreased by concentrating on the set 
of budgets {B -I- 1, ... , B} only. Unfortunately, this convenient property does not generally hold for 
implicitly causal problems. (E.g., when refueling is allowed, two cars with different capacity of gas 
tanks might have different optimal driving directions even if they are starting out with the same 
amount of gas.) 

2.1. Safe/Unsafe splitting without budget resets. The framework presented so far is suit- 
able for fairly general resource expenditure/accumulation models. In this paper, we are primarily 
interested in a smaller class of problems, where the state space is split into "safe" and "unsafe" com- 
ponents (i.e., X\{t} = SVJU) with the secondary cost not accrued (i.e., the constrained resource 
not used at all) while traveling through S. A simple example of this is provided in Figure [ij 

In the absence of "budget resets" this safe/unsafe splitting is modeled by simply setting dj = 
whenever Xi G S, yielding semi-implicit causality; see Figures [2] and [3] To make this example intu- 
itive, we have selected the primary costs Cij to be such that the "primary-optimal" path (described 
by U] see Equation ([T])) always proceeds from Xi to Xi+i, etc. However, the budget limitations 
can make such primary-optimal paths infeasible. E.g., when _B = 3 and no resets are allowed, the 
best feasible path from x^^ is (0:4, £C5, ajg, i), resulting in Wf = 6 > 5 = U^] see Figure [3| Still, all 
constrained-optimal paths travel either on the same b slice or downward, resulting in a semi-implicit 
causality. 

Since for large B the expanded graph is significantly bigger than the original one, it is useful to 
consider "fast" techniques for pruning X. We note that the Wf = 00 for all Xi e U and so the 
"0-budget" copies ofU nodes can be always safely removed from the extended graph. A more careful 
argument can be used to reduce the computational domain much further: 

Remark 2.4. To begin, we describe the resource-optimal paths by defining another value function 
Vi — V{xi) to be the minimum resource budget b needed to reach the target from Xi. For the 
described model, Cij > and it is easy to show that this new "resource value function" would have 
to satisfy 

(3) V,^ min {c,j+V,}; z G I = {1, . . . , M}, 
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Figure 1. Green nodes are in S and Red ones are in U. The primary costs Cij are 
specified for each hnk, while the secondary costs are Cij = 1 if Xi d U and Cij = 
if Xi e S. The arrow types (sohd, dashed, or dotted) also correspond to different 
values of primary transition costs. To build a concrete illustration, we will assume 
that i? = 3 is the maximum number of Red nodes we can go through. 
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Figure 2. Budget-constrained shortest path (no "resets"). For every node except 
for the target, the superscript denotes the remaining red budget at that node. 
Transition on the same level from a green node, down to the next level if going from 
a red one. The primary transition costs are indicated by the type of the arrow; see 
Figure [T] 



We will further define Vi as the minimum resource budget sufhcient to follow the primary-optimal 
path from Xi. The equation for Vi is similar to ([3|, but with the minimum taken over the set of 
optimal transitions in ([T]) (instead of minimizing over the entire Ni). Analogously, we can define Ui 
to be the primary cost along the resource-optimal trajectories. For the example presented in Figure 
[l] the values of all four of these functions can be found in the left half of Table |2.1| 
We note that 

1) Wi = Ui for b = Vi. We will refer to the set of such x^ nodes as the Minimum Feasible Level 
since Wf — oo for any P <Vi. 

2) = Ui for any b > Vi, since the primary-optimal trajectories are feasible there. 

These two observations can be used to strongly reduce the extended graph on which we solve the 
system E.g., in Figure [sj this pruning excludes all nodes except for {x'l,X2,xl}. In examples 
where Vi ^ B <ti Vi for most nodes, this remaining subset of X will be generally much larger, but 
the pruning is still worthwhile since J7, V, V, and U can be computed on a much smaller (original) 



graph. A similar procedure for the continuous case is described in section 5.4 



2.2. Safe/Unsafe splitting with budget resets. Our main focus, however, is on problems where 
the budget is also fully reset upon entering S. To model that, we can formally set Cij = —B < 0, 
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..3 0==0^Q-„0 Q-,,Q 



6=0 (^^^:i;zzi(^^ — ^CH^ — ^CH^ 




Figure 3. Semi-implicit causality: run Dijkstra's algorithm slice- by-slice on the 
expanded graph (from bottom to the top). The number inside each node represents 
the minimum cost from x'^ to the target. (If there are no "green cycles" , the causality 
becomes explicit and there is no need for Dijkstra's.) 
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Table 1. Various "optimal" costs for the example in Figure[T]with B ^ 3. Primary- 
optimal cost ([/), resource-optimal cost {V), resource cost along primary-optimal 
path {V), and primary cost along resource-optimal path (U) for the no- reset and 
budget-reset problems. Note that for the latter problem we can also define V{x3) = 
00 since B — 3. 



ensuring the transition from a;^ to x^ , whenever Xi £ S. Since we always have a maximum bud- 
get in S, there is no need to maintain a copy of it in each &-slice. This results in a smaller ex- 
panded graph with {\S\ + B\U\ + 1) nodes; see Figure |4j Unfortunately, the negative secondary 
costs make this problem implicitly causal, impacting the efficiency. Moreover, unlike the no- 
resets case, the projections of constrained-optimal trajectories onto the original graph may contain 
loops. E.g., starting from X4 with the resource budget b = 2, the constrained-optimal trajectory 
is (3:4, 0:3, X2,Xz, Xq, Xj, ajg, t) and the first two transitions are needed to restore the budget to the 
level sufficient for the rest of the path. Note that, if we were to re-solve the same problem with 
B = 4, the constrained-optimal path from the same starting location and budget would be quite 
different: {x^, X3, X2, x^, X4, x^,Xq, Xj, Xg, t), resulting in a smaller W4 = 9; see Remark 2.3 



Unfortunately, the domain restriction techniques outlined in Remark 2.4 are no longer directly 
applicable. For the current example, the values of functions U, V, V, and U are provided in the 



right half of Table 2.1 but their interpretations (and algorithms needed to compute them) are more 
subtle. E.g., the function Vi is naturally interpreted as the minimum starting budget sufficient to 
reach t starting from Xi. Thus, Vi is not defined on "safe" nodes (since on S the budget is always 
equal to B), and for Xi E U the value of Vi is i?-dependent. If Vi ~ b, then Y''{xi) 7^ and 
Y^{xi) = for all b < b. If y = (i/q = tCi, . . . , = t) e Y^{xi) and s is the smallest index such 
that G 5, then b = X]fc=o "^(yfe' ^fc+i)- ^ result, we could also interpret Vi as the minimum 
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Figure 4. The problem with budget resets for B = 3. (The 6 = shoe is omit- 
ted simply because = 00 for any Xi e U.) Implicit/monotone causality: run 
Dijkstra's algorithm on the entire expanded graph. The number inside each 
node represents the minimum cost from x^ to the target. We show in bold font the 
values different from the "no resets" case. 



"resource cost" of traveling from Xi through U to reach S = {xj E S \ < 00}. Unfortunately, 
this definition does not allow for efficient computations on the original graph, since S is not known 
a priori. 

If the values W^^ were already known for all the "Green" nodes Xj e S, we could efficiently 
compute Vi for all Xi G U (thus restricting the computational domain); moreover, all could be 
then computed in a single upward sweep (from the slice 6 = 1 to the slice b = B). This observation 
serves as a basis for the iterative method summarized in Algorithm [T] Intermediate stages of this 
algorithm applied to the above example are depicted in Figure [5] 

Remark 2.5. Since the primary costs C^-'s are positive, the value function can be still computed 
non-iteratively - by using Dijkstra's method on the full expanded graph without attempting any 
domain restriction techniques. It is possible to find examples, on which the advantages of restrict- 
ing the domain outweigh the non-iterative nature of Dijkstra's method. But our main reason for 
describing the iterative Algorithm [T] is its applicability to the problems of sections |3] and |4j where 
Dijkstra-like methods are generally inapplicable. 



Section 3. Stochastic SP on graphs with renewable resources. 

Stochastic Shortest Path (SSP) problems form an important subclass of Markov Decision Pro- 
cesses (MDPs) with a range of applications including manufacturing, resource allocation, and dy- 
namic routing 1^. In addition, it is well known that SSPs can be used to obtain semi-Lagrangian 
discretizations of continuous optimal control problems [291 HI] ■ The latter approach has been also 
used in and the connections will be highlighted in section [Sj 

We first briefly describe the usual (single-criterion) SSPs, closely following the exposition in 
standard references; e.g., [5]. We consider a controlled stochastic process on a previously described 
graph with nodes X. A compact set of controlled values Ai = A{xi) is assumed known for every 
Xi G X. A choice of the control a E Ai determines our next transition penalty Ci{a) = C{xi, a) and 
the probability distribution over the set of possible successor nodes; i.e., Pij{a) = p{xi,Xj,a) is the 
probability of transition to Xj if the current node is Xi and we choose the control a. Both p and C are 
usually assumed to be known and continuous in a. We will further assume that C{xi,a) > A > 0. 
The target t is assumed to be absorbing; i.e., Pft{a) = 1 and Cf{a) — for all a E Af. The goal 
is to minimize the expected total cost from each Xi to t. More formally, a stationary policy can be 
defined as a mapping ^ : X ^ \J- Ai satisfying fi{xi) E Ai for all Xi E X. The expected cost of this 
policy starting from Xi is 



J{x„p.) = E 



+ 00 

.k=0 
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1 Initialization: 



2 Wt 

3 W, 



V Xj e S; 



0; 

= oo, 

4 Wl := oo, y x,eU,be B. 

5 Compute Ui for all Xi E X. 



B 

j ■ 

lb ._ 



6 Main Loop: 

7 repeat 

8 Phase I: 

9 Using the current Wf values to specify S, compute Vi and Ui for each Xi E U. 



10 

11 

12 

13 

14 

15 
16 

17 

18 

19 

20 
21 
22 

23 
24 



foreach h ~ 1 , . . . , i? do 
foreach Xi £ U do 

\ih>Vi then 

\i h = Vi then 

:= U,:, 



else 

if ly^-i = U, then 



else 



Compute Wi from equation [2j 



end 



end 



end 



end 



end 



25 
26 



Phase II: 



Run Dijkstra's method on S only to (re)-compute W^s for all Xj € S 



27 until all 's stop changing] 

Algorithm 1: Iterative algorithm for the DSP problem with budget resets. 



where y = (j/q ^itVi^ ■ ■ .) is a random path resulting from applying /z. Under the above assump- 
tions, it is easy to show that there exits an optimal stationary policy ji* , whose expected cost is used 
to define the value function: 

Ui = imn J {Xi.ji) = J{xi,fi*). 
Moreover, the optimality principle yields a non-linear coupled system of equations 



(4) 



M+l 

Ui = min I C{x.i.a) + ^ pij{a)Uj 
aeAi I 



V i el; 



with Uf — 0. The vector U — {Ui, . . . , Um) can be viewed as a fixed point of an operator T : 
— >■ R^^ defined componentwise by the right hand side of equation Q. This operator is not a 
contraction, but under the above assumptions one can show that T will have a unique fixed point, 
which can be recovered as a limit of value iterations C/"+^ = T{U'"') regardless of the initial guess 
[/° e i?*^; see [TU]. However, for a general SSP, an infinite number of iterations may be needed for 
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Initialize: = = = oo. 

Fix "Green" and solve on "Red": 

Xi X2 X4 X^ Xq X'! ajg 




Fix "Red" and solve on "Green": = 2. 

Fix "Green" and solve on "Red": 

Xi X2 x^ X4 x^ Xq X7 ajg 




Fix "Red" and solve on "Green": = 9; Wi = 8. 

Fix "Green" and solve on "Red": 

Xi X2 X^ X4 Xr, Xq X7 Xg 




Figure 5. The problem with budget resets for B = 3 solved iteratively. Note that 
the MFL for x^ and X4 is not correctly determined until the last iteration. 



convergence while non-iterative (label-setting type) methods generally do not produce the correct 
solution to the system Q. An important practical question is to determine a causal subclass of 
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SSPs; i.e., the SSPs on which faster label-setting algorithms can be used legitimately. The first step 
in this direction was made by Bertsekas |9|, who showed that Dijkstra-like methods apply provided 
there exists a consistently improving optimal policy; i.e. an optimal stationary policy fi* such that 
Pij{li*{xi)) > Ui > Uj. Unfortunately, this condition is implicit and generally cannot be 

verified until the value function is already computed. Explicit conditions on the cost function C 
that guarantee the existence of such a /i* have been derived in [15] for a subclass of Multimode SSP 
problems (MSSPs). The latter include some of the SSPs resulting from discretizations of Eikonal 
PDEs, a perspective first studied by Tsitsiklis in [?T] and later extended by Vladimirsky in [32] ■ 

While the methods for multi-objective control of discounted infinite-horizon MDPs have been 
developed at least since 1980s ;?3], to the best of our knowledge, no such methods have been 
introduced for mult i- objective SSPs so far. Similarly to the deterministic case, we will assume 
that the secondary cost Ci{a) = c{xi,a) is specified for all controls, that the secondary costs are 
conveniently quantized (e.g., integers), and that the goal is to define the value function by minimizing 
only over those policies, whose cumulative secondary costs always stay between and some positive 
integer B. More formally, we consider stationary policies on the extended graph (i.e., /i : X x Z — >■ 
Uj Ai) and define the restricted set of allowable policies 

M{x,,b)^{fi\P{0< bQc{yo,fi{yo,bo))e...ec{y,,fi{y,,bs)) < B) - 1, V s > 0} , 

where y = [y^ = Xi,yi, . . .) is a random path and {bo = 6, 6i, . . .) is the corresponding sequence 
of secondary budgets resulting from applying /i starting at {xi,b). The value function can be then 
defined as Wj^ = min^^M{Xi,b) J{xi, b, /i) and the optimality principle yields the system of equations 

{A/+1 1 
Cix,,a) + y (a)iy''®"-(°) } ; be[0,B]; V ^ G I, 
U J 

where = {a G A, | Ci{a) < b}, and W^" = 0, 

This system can be always solved by the value iterations on the expanded graph regardless of 
signs of Ci(a)'s. But for general SSPs, this iterative algorithm does not have to converge in a finite 
number of steps, if the problem is not causal. For the subclass of MSSP problems, the criteria 
developed in j42| can be used on the primary running cost function Ci{a) to determine if a more 
efficient Dijkstra-like method is applicable; in the latter case the system is implicitly causal. 

If we assume that all Ci(a) > c > 0, this obviously rules out policies which do not guarantee 
terminating in B/c steps. Similarly to the deterministic case, this assumption yields an explicit 
causality, allowing to solve the system ([5| in a single sweep (from b — to b = B). 

Finally, if Ci(a)'s are known to be non-negative, W^^ may depend on Wj'^ only if bi > 62- 
Assuming the MSSP structure and the absolute causality of each Ci{a), the causality of the full 
problem is semi-implicit and a Dijkstra-like method can be used on each individual 6-slice of the 
extended graph separately. 

Returning to modeling the use of resources in an "unsafe" set only, we can achieve this by setting 
Ci{a) — 0, whenever Xi £ S and Ci{a) > for Xi G U, yielding semi-explicit causality. On the other 
hand, to model the budget resets in "safe" nodes, we can instead define Ci{a) — —B < 0, for all 
Xi G S. As in the previous section, the resets make it unnecessary to store more than one copy of 
safe nodes (since b ^ B on S). The resets make it harder to perform domain restriction techniques 



similar to those described in Remark 2.4 This difficulty can be circumvented by a minor modification 
of Algorithm [T] In section [5] we explain that the SSP interpretation of the PDE discretization is 
actually causal on the S (and therefore a Dijkstra-like method is applicable to recomputing the W.f 



values for Xj G S); see also Remark 5.4 For the general budget-reset SSPs, Algorithm [T] would have 



to be changed to use the value iterations on S instead. 

Section 4. Continuous problems with budget reset. 

4.1. Continuous optimal control formulation. We begin with a brief review of the basic exit 
time optimal control problem in the absence of resource constraints. Given an equation for controlled 
motion in ft, the objective is to characterize optimal paths (and their associated optimal costs) from 
every point in a domain ft to the boundary dU. The static Hamilton- Jacobi-Bellman (HJB) partial 
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differential equation (PDE) discussed in this subsection is fairly standard and we focus only on 
the details important for the subsequent discussion. For a comprehensive treatment of more general 
resource-unconstrained problems, we refer interested readers to the monograph [3] and the references 
within. 

4.1.1. The Hamilton- J acohi- Bellman equation. Consider an open bounded set ^l C ii". Let A e i?" 
be a compact set of control values and A the set of admissible controls, 

(6) A — {measurable functions a : R ^ A}. 

Let y : [0, oo) — ^ fl represent the time-evolution of a particle state governed by the dynamical system 

'^{t)^y{t) = f{y{t),a{t)), 
y(0) = X, 

where /: x A — > i?" represents the velocity. We refer to all y(-) that satisfy Q with admissible 
controls A as admissible paths. For notational simplicity, we have suppressed explicitly writing the 
dependence of y(-) on the control a ^ A and the initial state a; e 17. 

Define K : Vt ^ R &s the running cost and q: dVl — )■ i? to be the terminal cost when the state 
reaches a point on the boundarj]^ Thus, the total cost associated with an initial state x & VL and 
an admissible control a(-) e A is 

(8) J{x,a{-))^ I K{y{s),a{s))ds + q{y{T)), 



where T = min{t | y(t) £ 9f2}. Then the value Junction u: — > i? is the minimal total cost to the 
boundary d^l starting at x: 

(9) u{x)= mi J{x,a(-)). 

a(-)eA 

By Bellman's optimality principle [7 , for every sufficiently small r > 0, 



(10) u{x)= inf / K{y{s),a{s))ds + u{y{T)) 

A Taylor series expansion about x yields the Hamilton- Jacobi- Bellman equation: 

(11) min{K{x,a) +Wu{x) ■ f{x,a)} ^0. 

compact set A. Note that, since the minimization is taken over a compact set A, the infimum 
becomes a minimum. From the definition, the boundary condition on the value function is 

(12) uix)=q{x), xedfl. 

4.1.2. Viscosity solutions. For the functions f,K,q introduced above, we make the following as- 
sumptions: 

(Al) / and K are Lipschitz continuous. 

(A2) There exist constants Ki,K2: such that < Ki < K{x, a) < K2 for all a; G and a £ A. 
(A3) The velocity f : ft x Ai-^ i?" is bounded (i.e., ||/|| < F2) and the motion in every direction 

is always possible, i.e., 

Va; e Q, and all unit vectors v £ il", 3a £ A s.t. v- f{x,a) = ||/(a3,a)|| > Fi > 0. 

Moreover, we assume that the scaled vectogram {f{x, a)/K{x, a) \ a £ A} is strictly convex 

at each x £ fl. 
(A4) q is lower semi-continuous and mingo q < 00. 

^In many applications, the objective is to optimally steer towards a prescribed closed target set T C This fits 
into our formulation by setting the domain to be f!\7~ and the terminal cost to be g = 00 on 9f! and q finite on dT. 
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The assumption |(A3)| yields the small-time controllability property P] which guarantees the conti- 
nuity of the value function u on Q. In general, even under the aforementioned assumptions, classical 
solutions to the PDE ( |Tl| ) usually do not exist, while weak (Lipschitz-continuous) solutions are not 
unique. Additional selection criteria were introduced by Crandall and Lions [16 to recover the 
unique weak solution (the so called viscosity solution), coinciding with the value function of the 
control problem. 

4.1.3. Examples. In the special case of geometric dynamics with unit running cost, 
(13) A = {a; I ||a;|| = 1}, f{x, a) = af(x, a) and K{x, a) = 1, 



the PDE (11) reduces to 



(14) ^max {f{x,a) (-a •Vu(a;))} = 1. 

This models a particle which travels at speed f{x, a) at the point x in the direction a E A. For 
zero terminal cost (7 = 0, the value u{x) equals the minimum travel time of such a particle from x 
to dfl. If we further assume isotropic speed f{x,a) = f{x), (14) simplifies to the eikonal equation: 

(15) f{x)\\Vu{x)\\ = l. 



Finally, for a unit speed f — 1 with q — 0, the viscosity solution to (15 1 is the distance function to 

on. 



4.1.4. Optimal trajectories. Under the assumptions (Al)|[(A4)' the value function u can be used to 



extract the optimal feedback control a* — a*{x), provided u is smooth at x £ fl. For the eikonal 



case (15), for instance, it can be shown that a*{x) — — Vu(a;)/|Vw(a;)| (note that Vw under 
the assumptions |(A2)[ ) ; the points where Vit does not exist are precisely where no unique controls 
exist. Subsequently the optimal trajectory y*{-) from x € can be computed as solution to the 
initial value problem ([t]) with a{t) — a*{y{t)). Moreover, it is known that the optimal trajectory 
from a point x coincides with the characteristic curve to x from the boundary dfl, but traveling 
in the opposite direction. Furthermore, a path starting at a point where a unique optimal control 
exists will have a unique optimal control for all i > until it reaches dQ. 

4.2. Augmented PDE for the budget constrained problem. The problem of budget con- 
strained optimal paths was proposed in |28j , where general multi-criterion optimal control problems 
were solved an augmented PDE on an augmented state space. For the purpose of this article, we 
shall briefly describe this method for the single budget constraint. 

4.2.1. The augmented PDE for the single budget constrained problem. Define the extended space 
= £7 X (0, i?], where i? > is a prescribed maximum resource budget. We shall call a point in 
(a;, b) G an extended state. Here, b represents the current resource budget. We represent a path 
parametrized by time i > in the extended domain as z{t) = {y{t), /3{t)) € ilg- 

Next, define the secondary cost K : ft x A R, strictly positive, which is the decrease rate of 
the budget. We shall assume that K is Lipschitz continuous and there exist constants Ki,K2 such 
that 

(16) < Ki < K{x,a) < K2, yxen,aeA. 

Then the equations of motion in fig are 

m = fiy{t)-a{t)), yiO)^x, 
^^^^ m = -K{y{t),a{t)), m^b. 

For an arbitrary control a £ A, define the terminal time starting at the extended state (x, b) as 

f{x, b, a) = min{i | yit) e dVL, i3{t) > 0}. 
Define the set of all feasible controls for paths starting at the extended state {x, b) as 

A{x, b) = {aeA \ f{x, b, a) < 00}. 
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We shall call paths corresponding to feasible controls as feasible paths. Then the value function is 

w{x,b) = inf J'{x,a{-)). 
a{-)eA{x,b) 



The boundary condition is 

(18) w{x,b) 



q{x) X edn,be[0,B] 
oo X £ fl, b — 0. 



Note that 'w{x,b) > u{x), since in the unconstrained case the set of controls is larger. Moreover, if 
the starting budget b is insufficient to reach the boundary, the set A{x^ b) is empty and w{x,b) — oo. 
Wherever the value function is finite on f2e, Bellman's optimality principle similarly yields the HJB 
equation 

(19) mm IVxw ■ f{x,a)- k{x,a)^ +K{x,a)\ ^0. 

Here Va; denotes the vector- valued operator of partial derivatives with respect to x. For the case of 
isotropic velocity and unit running cost K — 1, and K{x, a) = K{x), the latter equation reduces to 

(20) /(a;)|Va;H+^(^)^ = l- 



Remark 4.1. The (augmented) velocity function {f{x, a), —K{x, a)) in (17 1 no longer satisfies the 
assumption |(A3)"] This implies the lack of small time controllability in ile and possibly discontinuities 
in the corresponding value function w. While the original definition of viscosity solutions required 
continuity, a less restrictive notion of discontinuous viscosity solutions applies in this case; see ^ 
Chapter 5], |38j, |3T] and the references therein. 



The assumption (161 on K implies that the characteristics move in strictly monotonically increas- 



ing 6-direction (see also Property 4.6 1. Thus, the value function w is explicit causal in b: the value 



function at a fixed point x' G fi, 6' e (0,5] depends only on the value function in {{x,b) \ b < b'}. 



Moreover, since > 0, by rearranging the PDE (20) 

dw 



{l-fix)\Vxw\ 



db K{x) 

the problem can be viewed as a "time-dependent" Hamilton- Jacobi equation, where b represents 



the "time." In [25], this explicit causality property of (19) yielded an efficient numerical scheme 
involving a single "sweep" in the increasing 6-direction (see section |5.3[). We note the parallelism 
with the discrete problem formulation, described in case 2 of Remark |2.2[ 

Remark 4.2. Before moving on to the budget reset problem, we briefly discuss a slight generaliza- 



tion to the budget constrained problem in |28j . Suppose we relax the lower bound in (16) to allow 
= on some closed subset 5 C 11. The set S can be interpreted as a "safe" set, on which the 
resources are not used. (In this setting, the problem previously considered in [28j corresponds to 
S = dfl.) Clearly, explicit causality no longer holds when 5 H 7^ 0; rather, w has a semi-implicit 
causality property: the value function at a fixed point a;' G il, 6 G (0, i?] depends only on the value 
less than or equal to b' in the 6-space. The interdependence between values on the same b level 
makes it impossible to transform the PDE into a time-dependent Hamilton- Jacobi equation (as with 
the explicit causality case). Instead, the value function satisfies a separate eikonal equation in each 
6-slice: 

(21) minWxwix.b)- f{x.a) + K(x,a)} = 0, b e [0, B],x e mUS). 

aeA 

We again note the parallelism with the discrete formulation, described in case 3 of Remark |2.2| see 
also Figures [2] and [3] 
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4.3. A PDE formulation for the budget reset problem. We now consider a continuous ana- 
logue of the problem in section 2.2 and Figure |4j Partition the closure of the domain Ct = U U S, 



where U is open (thus, dfl C S). Here, U represents the "unsafe" set, where the budget decreases 
at some Lipschitz continuous rate K, 

(22) < ki < k{x,a) < k2, yxeU,aeA. 

and S represents the "safe" set, where the budget resets to its maximum. We denote the interface 
between the two sets in the interior of the domain as F = dU n fi. We will further assume that the 
set S\{x € dn I q{x) < oo} consists of finitely many connected components. 



To model the budget reset property, the budget (3{t) is set to B in S. Thus, equations (17 1 are 
modified accordingly in S, and the resulting dynamics is described by a hybrid system: 

(24) h if.(.).5. 



with initial states: 

(25) m 



B ify(0)e5. 

Similarly to section [42] we define the terminal time for a control a G A a,s 
f{x, 6, a) ^ mm{t \ y{t) e dVL, /3(t) > for all r e [0, t]} 



where the paths y(-) satisfy the equations (23)-(p5|. Let A(x,h) = {a E A \ T(x,b,a) < oo} to be 



the set of all feasible controls for the budget reset problem. Then the value function is defined as 

w{x,b) = inf J{x,a{-)). 
a{-)eA{xM) 

Remark 4.3. A simple, illustrative example is a budget reset problem, in which the objective is to 
reach a target in minimum time (K = 1, q = 0) while avoiding a prolonged continuous tour in U of 
more than time B > (hence, k — 1). 

4.3.1. Augmented PDE in the reduced extended domain. For the budget reset problem, a path z(-) = 
(y(-),/3(-)) in the extended domain never visits points {{x,b) \ x E S,b < B}. Thus, we consider 
the reduced extended domain 

rir = fiw U C rie, 

where, 

riu^Ux (0, B] 
Qs^Sx {B}. 

This reduction of the extended state space is analogous to the discrete example described in Figure|4] 
The corresponding HJB equation for this hybrid control problem in Qr can be described separately 
in flu and fig. To distinguish between the value functions in these two domains we write: 



w{x, b) 



wi{x,b) {x,b) e flu 

W2{x) X IE S. 

As in the no-resets case, the value function w is usually infinite on parts of fir (if the starting budget 



is insufhcient to reach the boundary even with resets); see also section 5.4 Bellman's optimality 



principle can be used to show that, wherever w is finite, it should satisfy the following PDEs: 

(26) mml^'7wi{x,b)- f{x,a)-k{x,a)'^{x,b)+K{x,a)^ =0, {x,b)eflu, 

(27) m:m{Vw2(,x) ■ f{x,a) + K{x,a)} ^0, a; e int(5). 
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The boundary conditions on w are: 

xedncS 



(28) w{x,b) 




xen,b^O. 



Remark 4.4. We note the similarity between the previously defined PDE (21 ) and the last equation 
( 27 1 . But while the latter is solved only on one 6-slice (for b = B), the former is actually a system of 
static PDEs (one for each b e [0, B]) with no direct coupling. This difference is a direct consequence 
of the budget reset property. 



In addition to the boundary condition (281, the following compatibility condition (motivated by 
Proposition 4.9) holds between wi and W2, wherever w is finite on F : 

(29) lim wi{x',b) ^ ■W2{x), for all a; e F, e (0, B]. 

Xitu 

We now list several properties of the value function w. The proofs of the first two of them are 
omitted for brevity. 

Property 4.5. In flu, all trajectories are monotonically decreasing in the b direction, when travers- 
ing forward in time. 

Property 4.6. If bi < b2 < B then wi{x,bi) > Wi{x,b2) for all x eU. 

From here on we will use S-jz — {x £ S \ W2{x) < oo} to denote the "reachable" part of the safe 
set. We note that every connected component of 5\ {cc G dil \ q{x) < oo} is either fully in Sn or 
does not intersect it at all. As a result, the number of connected components in the latter set is also 
finite. 

Lemma 4.7. W2 is locally Lipschitz continuous on STz\dCl. 

Proof. We will use the notation B^{x) = {x' e i?" | \\x - x'\\ < e}. 

Choose a point x e STi\dft, and an e > sufficiently small so that B^(x) C il and 

(30) ^ < f. 

Ii K2 

Take any x' S B^{x)C\{S-]i\d^). Then, a straight line path from x to x' will take at most ||a; — 
time to traverse. Furthermore, by (30), such a path must be feasible. Thus, Bellman's optimality 
principle gives W2{x') < K2/Fi\\x — x'\\ + W2{x). By swapping the roles of x and x' we have 
1^2(3;') — W2{x)\ < K2/Fi\\x — x'W, as desired. □ 

A consequence of Lemma |4.7| is that W2 is bounded on each connected component of S-r,, excluding 

dfl. Since S-n consists of finitely many connected components, w™'*'^ = sup W2{x) is also finite. 

xeSTi\dn 

Lemma 4.8. For a point x CzU .such that Wi{x,B) < 00, let y* be an optimal feasible path from x 
to dfl. Let T{x) be the first arrival time of y* to dQ. Then, 

(31) T{x)< ^ ' 



max 



Proof. Let t* > be the first instance in time such that y*{t*) E S. We subdivide y* into two 
segments, the first from x to y*{t*) and the second from y*{t*) to the final point y*{T{x)) G dfl. 
The time taken to traverse the first segment is bounded by B/Ki to satisfy the budget feasibility 
constraint, and the second segment is bounded by wf^^/Ki. □ 



Proposition 4.9. The compatibility condition (29), holds on dS-n\d^ for b e (0,-B). 

Proof. Fix a point x E dSTz\dil. Choose an ei > sufficiently small so that Bf-^{x) C and 
£1 < bFi/K2. If we choose x' E U such that ||a;' — a;|| < ei, by arguments similar to the proof of 
Lemma |4.7[ the straight line path from x' to x is feasible. Thus, by Bellman's optimality principle, 

(32) wi{x',b)<W2{x) + eiK2/Fi, for 6 € (0,5]. 
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Suppose now that b e (0,B). Choose £2 > so that Bf,(x) C and 62 < Assume 
x' £ U and — a;'|| < £2, and consider the straight hne path from a; to a;'. Suppose the resource 
cost and time required to traverse this path is b' and r, respectively. Since t < 62/ Fi, we have by 
the choice of £2, b' < TK2 < {f-2/Fi) K2 < B — b, or equivalently, & < B — b' . 
Thus, by Bellman's optimality principle, 



(33) 



W2{x) <wi{x',B-b')+e2K2/Fi 

<wi{x',b) + e2K2/Fi for6e(0,B), 



where the second inequality follows from Proposition 4.6 Therefore, ( |32[ ) and (33 1 imply that if 
\\x' — x\\ < £ = min{£i, £2} then \wi{x' , b) — W2{x)\ < eK2/Fi for x' EU,b £ (0, B); this proves the 
compatibility condition (29) for 6 G (0,5). □ 

Proposition 4.10. Assume the running costs and the dynamics are isotropic; i.e., K{x, a) — K{x), 



K{x,a) — K{x), and f{x,a 
dSnXdn forb^B. 



afix)A\a\\ . 

Proof. Fix X £ dSTi\dft. We prove that W2{x) 



1 . Then the compatibility condition ( 29 ) holds on 



lim wi(x' B) < 0. Note that (29) follows, since 
x'^x ' — ' 

the proof of (32) also covered the current case (6 = B). 

Take a sequence Xj £ U converging to x, such that wi[xj,B) is finite for each j. Suppose for 
each j that a* £ A{xj,B) is an optimal feasible control and y* is the corresponding feasible path 
such that Tj = mm{t \ y*At) £ dVl}. Let Tmax = sup {rj} and for each j, set y*At) = y*ATj) 



for t £ [Tj,Tmax]- Note that by Lemma 4.8 Tmax is finite. Also, condition (A3) yields uniform 



boundedness and equi-continuity of the paths y*. Therefore, the Arzela-Ascoli theorem ensures that 
(upon reindexing in j) a subsequence y* converges uniformly to a path y* in Q corresponding to 
some admissible control a* £ A. 

Next we show that a* £ A{x',B), i.e. y* is a feasible path. Define T* (and T*) to be the first 
instance when y* (and, respectively, y*) reaches dS-ji. Let T be the infimum limit of the sequence 
{T*} and consider its subsequence such that, upon reindexing in j, T = lim T* . Since dS-jz is closed, 

y* (f) = lim y*{T*) £ dSn, and thus f > T* . This implies that hm J^?} k{y*As))ds > 0. Using 



this observation and the Lipschitz continuity of K, we have. 



K{y*{s))ds 



k{y*As))ds 



< 



< L 



k(y*{s))-k{y*{s))ds 
k{y*{s))-k{y*{s)) 

y*{s) - y*,{s)\\ ds - 



T' 



k{y*As))ds 



ds 



k{y*As))ds 



T* 



K{y*As))ds, 



T' 



where L is the Lipschitz constant for K{-). Since the last expression has a non-positive limit as 
j 00 and each y* is feasible, this shows that on the interval [0, T*] the trajectory y* is feasible as 
well: 

K{y*{s))ds < lim / K{y*As))ds < B. 
J^oo Jo 

If y*{t) does not remain in S-ji after t ^ T* , a similar argument can be used to prove the feasibility 
of all other "unsafe segments" of the trajectory. (Alternatively, appending the optimal trajectory 
corresponding to y*{T*) £ Stz, yields another feasible trajectory, which is at least as cheap with 
regard to the primary running cost K.) 

Applying the same argument to the total running cost ([S]) and recalling that q is lower semi- 
continuous, we obtain J^{x,a*) < lim wi{xj, B). This completes the proof since, by definition of 



the value function, ^2(3;) < J{x,a* 



□ 
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Section 5. Numerical methods for continuous budget reset problems 

Throughout this section, we assume the following setup: let Q he a set of gridpoints on the 
domain 0. While a similar formulation can be constructed for arbitrary unstructured meshes, we 
restrict our description to a uniform Cartesian grid G with grid spacing h. We denote Gu = G nU 
and Gs = G <^ S. To simplify the treatment of boundary conditions, we assume that d^l and F 
are well-discretized by the gridpoints in dG C G and in dGs respectively. We also assume that 
the set of allowable budgets [0,i?], is discretized into equispaced intervals partitioned by gridpoints 
B = {bj = jAb I j = 0, 1, . . . , Nb}, where A6 > is fixed ahead of time. 

5.1. Discretization of the unconstrained case. To begin, we briefly discuss the usual semi- 



Lagrangian discretization techniques for static HJB equations of the form (11). Denote U{x) to be 
the numerical approximation to u(x) at the gridpoint x € G- Suppose the current system state is 
a; G ^ U and the constant control value a is to be used for a short time r > 0. Assuming that 
K and / are locally constant, the new position is approximated by Xair) = x + Tf{x, a) and the 
approximate accumulated transition cost is K{x, ajr. For small r, this yields a first-order accurate 
semi-Lagrangian discretization 

U{x) = iam{TK{x, a) + [/(a;a(r))}, for aU x e G\dG 

(34) 

U{x) — q{x), for all x e dG, 



of Bellman's optimality principle (10). Since Xair) is usually not a gridpoint, U{xa{T)) needs to 
be interpolated using adjacent grid values. Many different variants of the above scheme result from 
different choices of r. Falcone and co-authors have extensively studied the discretized system^ 
which use the same r > for all x and a; see [311201 HI] and higher-order accurate versions in |22| . 
Alternatively, r can be chosen for each a to ensure that Xair) lies on some pre-specified set near 
X. For example, in a version considered by Gonzales and Rofman [24], the motion continues in the 
chosen direction until reaching the boundary of an adjacent simplex. E.g., on a Cartesian grid in R^, 
if Xs and Xyj are two gridpoints adjacent to x and f{x, a) is some southwest-ward direction of motion, 
then r is chosen to ensure that Xair) lies on a segment XsX^^, and U{xa{T)) is approximated by 
a linear interpolation between U{xg) and U{xjj^). Interestingly, in the case of geometric dynamics 



(13), this type of semi-Lagrangian scheme is also equivalent to the usual Eulerian (upwind finite 
difference) discretization. A detailed discussion of this for isotropic problems on grids can be found 
in [41j and for general anisotropic problems on grids and meshes in [371 Appendix] . In addition, both 
types of semi-Lagrangian schemes can be viewed as controlled Markov processes on G (indeed, as 
SSP problems, discussed in section [s]); this earlier approach was pioneered by Kushner and Dupuis 
in [29]; see also a more recent discussion in [35] on applicability of label-setting algorithms. 



We note that (34) is a large coupled system of non-linear equations. In principle, it can be solved 
iteratively |32| . but this approach can be computationally expensive. An attractive alternative is to 
develop a Dijkstra-like non-iterative algorithm. For the fully isotropic case, two such methods are 
Tsitsiklis' Algorithm [41] and Sethian' Fast Marching Method [33]. An overview of many (isotropic) 
extensions of this approach can be found in |35j . Ordered Upwind Methods [36| I37j have similarly 
handled the anisotropic case; a recent efficient modification was introduced in [5]. Similar fast 
methods were introduced for several related PDEs by Falcone and collaborators [T71 [T^ [T31 [Tl] . 

An alternative approach is to speed up the convergence of iterative solver for ( |34[ ) by using 
Gauss-Seidel iterations with an alternating ordering of gridpoints. Such "Fast Sweeping" algorithms 
[TTl HUl Hil l?7] are particularly efficient when the direction of characteristics does not change too 
often. A comparison of various non-iterative and fast-iterative approaches (as well as a discussion 
of more recent hybrid algorithms) can be found in |15j . 

We emphasize that, for the purposes of this paper, any one of the above approaches can be used 



modular ly, whenever we need to solve equation (27). The discretization of (26) is explained in 



detail in subsection |5.3| But before that we address the main computational challenge: the a priori 



Some of the papers cited in this subsection have considered related but shghtly different PDEs, including those 
for finite-horizon and infinite-horizon optimal control, but the fundamental idea remains the same. 
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unknown boundary conditions on F, which resuhs in an imphcit interdependence of gridpoints in 
flu and Qg. 



5.2. Iterative treatment of the budget reset problem. First, we note that the simpler case 



of n = U (i.e., the constrained optimal control problem presented in section 4.2.1) can be solved by 
a single upward sweep in the 5-direction, as described in [55]. This is a direct consequence of the 
explicit caus ality property of the value function when K is strictly positive on fl; see the discussion 
after remark |4.l[ Moreover, without budget resets, relaxing K to be non-ne gati ve (i.e. introducing 



4.2 



a safe subset S where K = 0) still yields semi-implicit causality; see remark 

In contrast, the introduction of budget resets on a safe subset S C ft, breaks this causal struc- 
ture. Consequently, there are no known non-iterative algorithms to numerically solve such problems. 



Therefore, we propose to solve the PDEs (26 1 and (27) (with boundary conditions and the compat- 
ibility condition (29)) by an alternating iterative process. We construct a sequence of functions 
and W2 for fc = 0, 1, 2, . . . , which converge to wi and W2 respectively, as fc — >■ oo. 

We begin with a recursive definition for these new functions on flu and fig. Of course, the 



actual implementation in section [5 .4| relies on their numerical approximations; the resulting method 
is summarized in Algorithm [2] Initially, set 



Wi{x, b) ~ oo 



oo 



{x,b) £ riu: 
xeS. 



Then for fc = 1,2, 



Phase I: Find w'l as the viscosity solution of equation (26 1 with boundary conditions 



'q{x) 



(35) 



wUx,b) 



Imi mf 

X'^X 
X'es 

. oo 



\x') 



xedn,be {0,B] 
xer,be {o,B] 

xeU,b = 0. 



Phase II: Find W2 as the viscosity solution of equation (27) with boundary conditions 



(36) 



W2ix) = { liuiMwHx'.B) 

X'^X 
X'eu 



x(^dVl 
a; e F. 



Remark 5.1. We note that the liminf's in the above definition are primarily for the sake of no- 
tational consistency (since solving (27) on int(5) does not really specify W2^s values on F C dS. 



Alternatively, we can solve the PDE on 5, treating boundary conditions on F "in the viscosity 
sense" [1] . This is essentially the approach used in our numerical implementation; see the discussion 
following formula ( 40 ) . 



Remark 5.2. Intuitively, and can be interpreted as answering the same question as wi and 
W2, but with an additional constraint that no trajectory is allowed to reset the budget (by crossing 
from U to S) more than (fc — 1) times. As a result, for many problems convergence is attained (i.e., 
w'l — wi and W2 = W2) after a finite number of recursive steps. E.g., in the simplest case where 
K and / are constant on fl, JsT > is constant on U, and all connected components of S\dn are 
convex, then any optimal trajectory might enter each connected component at most once. See table 
[2] for the experimental confirmation of this phenomenon. 

5.3. Discretization of w^, wf. Let Wi{x,bj) be an approximation of Wi{x,bj) for all {x,bj) G 
Gu X 'B; similarly, let PFj (a;) be an approximation of wl^x) for all x E Qg. The "natural" boundary 
conditions are implemented as follows: 

(37) W^{x,bj)=q{x), xedgnU,bjeB, 

(38) W^{x)=q{x), xedgns. 
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Additional boundary conditions on F stem from the recursive definition of and (yielding the 



compatibility condition (291 in the limit). In Phase I, we use 



(39) W^{x, bj) = W^-^{x), X e gs,bj e B, 



and then solve the discretized system ( 41 1 on the relevant subset of Qu x B; see the discussion below 
and in section 15.41 

In Phase II, the numerical compatibility condition is enforced on the set of gridpoints 
Gu = {x £ Qu \ X is adjacent to some x' e S}. We set 

(40) W^{x) = W^{x,B), xeGl^, 

and then recover W2 by solving the system of equations equivalent to ( 34 1 on the entire Gs (including 



on n r). As explained in subsection 5.1 this can be accomplished by many different efficient 
numerical algorithms. 

To derive the system of equations defining Wi on Gu x B, we adapt the approach introduced in 



[28j . Property 4.5 is fundamental for the method's efficiency: the characteristic curves emanating 
from dS-jz all move in increasing direction in b. Thus, we only need a single 'upward' sweep in the 
b direction to capture the value function along the characteristics. We exploit this result in the 
semi-Lagrangian framework as follows. For x E G r\U , bj E B, t > 0, write Xair) = x + Tf(x,a) 
and bair) = bj — tK{x, a). If we choose r = Ta^x — {^b)/K{x, a), this ensures that bair) ~ fcj-i, 
and the semi-Lagrangian scheme at {x,bj) becomes 

(41) Wlix, b,) = min {tK[x, a) + W^{xa{T),b^-i)] . 



For each j = 1, 2, . . . , we solve (41 ) for Wi{x, bj), at each x E Gu based on the (already computed) 



Wi values in the bj^i resource-level. 

Remark 5.3. For an arbitrary control value a E A, the point Xair) usually is not a gridpoint; so, 
Wi{xa{T), bj^i) has to be approximated using values at the nearby gridpoints (some of which may 
be in Gs)- Since our approximation of Xair) is first-order accurate, it is also natural to use a first- 
order approximation for its Wi value. Our implementation relies on a bilinear interpolation. E.g., 
for n — 2, suppose the gridpoints Xi,X2,X3,X4 E G are the four corners of the grid cell containing 
Xair), ordered counter-clockwise with Xi on the bottom left corner. If (71,72) — (xair) — Xi)/h, 
then the bilinear interpolation yields 

Wi'ixaiT), 6j_i) = 71 (72W^a;3, 6,-1) + (1 - 72)W''(a;2, 6,-1)) 

+ (1-71) (72W'=(a;4,6j-i) + (1 - 72)W'=(a3i, 6,-1)) , 
where W''{x, b) = W^{x, b) if x E Gu and W'^ix, b) = W^^^ix) if x E Gs- 



Remark 5.4. The direct discretization of equations (26) and ( |27[ ) could be also interpreted as 
defining the value function of the corresponding SSP problem on (Gu x B) U Gs- In this framework, 
the iterative algorithm presented in this section can be viewed as a Gauss-Seidel relaxation of value 
iterations on Gu ^ B alternating with a Dijkstra-like method used on ^^5. 

5.4. Domain restriction techniques. The value function w{x, b) is infinite at points that are 
not reachable within budget b. Since the dimension (n -f- 1) of the domain where Wi is solved 
is typically large, a reduction of the computational domain to its reachable subset usually yields 
substantial saving in computational time. In |28] such a reduction was achieved by an efficient 
preprocessing step, which involved computing the minimum feasible level, i.e. the interface that 
separates the reachable and unreachable parts of ilg. Here we use a similar technique to find the 
"lower" boundary of the reachable subset of Qu - 



Note that, by Property 4.6 for any x ElA, wi^x, bi) < 00 implies wi{x, b) < 00 for all b E [61, B]; 
and ^1(0;, 62) — 00 implies wi{x, b) = 00 for all b E [0, 62]- We formally define the minimum feasible 
level (MFL) in the unsafe set as the graph of 

(42) v{x) — v[wi]{x) = min{& | wi{x, b) < 00}, x eU, 
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and in the safe set 5, as a graph of 



(43) v{x) 



oo a; G S\S-iz. 



The goal is to recover the MFL from some cheaper (lower-dimensional) computation. We note 
that onU, v{x) can be interpreted as the value function of another resource-optimal control problem, 
and as such it can be recovered as the viscosity solution of a similar HJB equation 

(44) min \K{x,a)+Vv{x) ■ f{x,a)} =0, xeU, 



coupled with the boundary conditions ( 43 ) . We note that the algorithm described in Appendix can 
be used to identify S-jz by an iterative process in the n-dimensional space. So, in principle, v{x) 
can be fully computed without increasing the dimensionality of fl. However, to use the MFL as the 



lowest budget" boundary condition for (261, we also need to know the values of Wi on the MFL. 



This corresponds to the "constrained optimal" cost ii along resource-optimal trajectories defined by 



v; see Table 2.1 for a similar example in the discrete setting. The function u is formally defined 



below; here we note that it can be also computed in the process of solving ( 44 1 provided W2 is a priori 
known on dS-ji. This is, indeed, the case when no resets are possible; i.e., U — fl and Stz d S — dfl, 
precisely the setting previously considered in [23 • Unfortunately, for the general case {U ^ ri), we 
do not know of any (fast, lower-dimensional) algorithm to compute W2 on dS-n. (Note that in a 
similar discrete example depicted in Figure [S] the values of the green nodes continue changing until 
the last iteration.) Instead, we proceed to recover the values on the MFL iteratively, using values of 
W2 on dS-ji- 

To describe this iterative process, we first define the fc-th approximation of the reachable subset 
of S: — {x ^ S \ wf (a;) < 00}. Then, MFL'', the fc-th approximation of the MFL, is a graph 



of v^{x) — v[wi]{x), which can be computed by solving p4| with boundary conditions (43) where 
S-jz is replaced by . The numerical approximation V^ix) can be efficiently computed using the 
methods discussed in section [5Tl 

Once is known, we can define the subset of U that is reachable at the fc-th iteration as 
Uj^ = {x &U \ v'^{x) < , and a function w*-' : U!^ R as 



(45) u'^ix) 



W2 ^{x), x £ dU. 



Since we intend to use as a "lower boundary" condition for , we must compute using 
only the information derived from Wj^^, already computed at that stage of the algorithm. For this 
purpose, it is possible to represent as a value function of another related control problem on Ul^. 

Let T = T{x,b,a) = min{t \ y{t) € S!^^}. Define A^{x) to be the set of aU "-y'^-optimal" 
controls; i.e., the controls which lead from x to S!j^^ through using exactly v^{x) in resources. 
For most starting positions x G U^, this set will be a singleton, but if multiple feasible controls are 
available, their corresponding primary costs can be quite different. Then u*^ can be characterized as 



(46) u''{x)^ inf / K{y{t),a{t)) dt + w^-\y{T)). 

a{-)eA''ix) Jo 



By Bellman's optimality principle, (46) yields 



(47) u''(x)= lim min {TK(x,a) + u''(x + Tf(x,a))}, 

r^o+ aeA''{x) 



where A (x) C A is the set of minimizing control values in (44). If U (x) is the approximation to 
u^{x) at a gridpoint x £ Qu^ & natural scmi-Lagrangian scheme based on is 



(48) U (x) = min {TK{x,a) + U''{x + Tf(x,a))}, xeU^, 
where Jj'^ix + Tf{x,a)) is interpolated, and the corresponding boundary condition is 

(49) ly'ix) = W^-\x) X e Gns!^-\ 
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Since the set A''{x) has to be found when solving (44), it is also natural to solve (48) at each 
gridpoint as soon as its V'' becomes available. 

Remark 5.5. As discussed above, U'^ acts as a numerical boundary condition on the surface b = 



V''{x) for the update scheme (41 1. However, in general, V^{x) ^ B. In our implementation, we 
set Wi{x^bj) = U^{x), where j is the smallest integer such that V^{x) < jAb. This introduces 
additional 0(A6) initialization errors at the MFL. An alternative (more accurate) approach would 
require using cut-cells when interpolating near the MFL. 

The resulting iterative method is summarized in Algorithm [2] 



1 Initialization: 

2 W^{x,b) := oo, 

3 wf{x) oo, 

4 W^ix) -qix), 



y xedn; 



5 Compute U{x) for all x E Q; 



6 Main Loop: 

7 foreach k 

8 



1,2,. 



until Wi and W2 stop changing do 



Using W2 ^ to specify Q n , compute V and U for each x E Qu- 



9 
10 

11 

12 

13 

14 

15 
16 

17 

18 

19 

20 
21 
22 
23 
24 



Phase I: 
foreach b 



,B do 



Ab, 2Ab, 
foreach x E Qu do 
if b> V''{x) then 

if 6 < {V''{x)+Ab) then 

Wf^{x,b) := U''{x); 
else 

if W^{x, b-Ab)^ U{x) then 

Wf^{x,b) := U{x); 
else 

Compute Wi{x,b) from equation 

end 
end 
end 
end 



41 



end 



25 Phase II: 

26 Compute W2 on Qs; 

27 end 

Algorithm 2: The budget reset problem algorithm. 



We note that the following properties are easy to prove inductively using the comparison principle 



g] on the PDEs ([26]), ([27]), and (|44]). 

Proposition 5.6. The iterative method is monotone in the following sense: 

(1) w''+^(a;, h) < w^{x, b) for each (x, h) E fi^ and /c 0, 1, 2, 

(2) C 5^+1, for each A: = 1, 2, . . . 
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(3) ti''+^(£c) < v''{x), for each x e and each k ^ 1,2,... 

Remark 5.7. Wc briefly discuss the convergence of W2 and Wi to W2 and wi, respectively, as 
h, A6 — > 0. Under the listed assumptions, the value function W2 is Lipschitz continuous on int(5) and 
can be approximated by the methods described in section [5?T| these approximations will induce errors 
depending on h only, since W2 is approximated only in the top slice h = B. For example, standard 
Eulerian type (first-order upwind finite difference) discretizations are 0{h) accurate, provided the 
solution has no rarefaction-fan-type singularities on the influx part of the boundary (see Remark 



6.2) 



On the other hand, the value function wi can be discontinuous on ^ly and a weaker type of 
convergence is to be expected as a result. In the absence of resets (i.e., with S = dVl), if we 



focus on any compact /C C on which wi is continuous, then the semi-Lagrangian scheme (41) 
has been proven to converge to wi on /C uniformly, provided h = o(A6) as h, Ab — > 0; see [5] 
To the best of our knowledge, there are no corresponding theoretical results for convergence to 



discontinuous viscosity solutions of hybrid control problems (e.g., equations (26)). Nevertheless, 
the numerical evidence strongly supports the convergence of described schemes (section 6.1 1. In 
pSj it was empirically demonstrated that without resets the Li-norm convergence (or Loo-norm 
convergence away from discontinuities) can be often attained even with a less restrictive choice of 
h = 0{Ab). In section [oj wc show that this also holds true even if budget resets are allowed. 

Finally, we note two additional sources of "lower boundary" errors: due to an approximation of 
the MFL and due to an approximation of wi = u on it; the first of these is 0{Ab) while the latter 
is 0{h). 

Remark 5.8. The optimal paths in fir can be extracted from the value function in a manner similar 



to the description in section 4.1.4 The only additional computation is in the 5-direction in for the 
parts of trajectory in flu: for example, in the isotropic case, the budget /3 along the optimal path 
y* decreases by K{y*{t)). For each {x,b), the optimal control value a* can be found either from 



an approximation of ^xU or by solving the local optimization problem similar to (41). Once a* is 
known, the system ( |23[ ) can be integrated forward in time by any ODE solver (our implementation 
uses Euler's method). 

Section 6. Numerical Results 

A simple, illustrative example for budget constrained problems in the discrete setting was already 
considered in detail in section [2j In this section, we present numerical results in the continuous 
setting using the algorithms described in section [5] 

For the sake of simplicity, we will assume that the dynamics is geometric and isotropic (i.e., 
f{x, a) — af{x)), the primary running cost \s K = 1 with the zero exit cost on the target (making 
the value function equal to the total time along the optimal trajectory), and K = 1 (constraining 
the maximum contiguous time spent in lA) in all of the examples. In addition to a numerical 
convergence test of Algorithm [2] (section 6.1), we will illustrate the effects of geometries oiU, S, 



spatial inhomogeneity of the speed / and different maximum budgets B. For each example we 



show the level curves of the value function at the top 6-slice (i.e., w{x,B)). In subsections 6.3 
and |6.4[ we also show constrained-optimal paths, starting at some representative point x £U with 
the maximum starting budget b = B. We emphasize that, for other starting budgets b < B, the 
constrained-optimal paths can be quite different, but all the data needed to recover them is also a 
by-product of Algorithm [2j 

The numerical solutions are computed on f2 = [— 1, 1]^ discretized by a uniform N x N cartesian 



grid. In all examples except for the convergence tests in section 6.1 we use N = 300, h — 2/{N ~ 1), 
and the budget direction is discretized with Ab — i?/round(g^) ; resulting in N}, = \B\ — {B/ Ab) -I- 
1 — 0{l/h). The main loop in Algorithm [2] was terminated when both HW^^'^ — Wf~"'^||oo and 
\\W^ - W^^^W^ feU below the tolerance threshold of 10^^ 

All tests were performed in Matlab with most of the implementation complied into MEX files. 
The tests were conducted on a 2.6 GHz MacBook computer under Mac OS X with 4 GB of RAM. 
On average, the computations took approximately one minute of processing time, but we emphasize 
that our implementation was not optimized for maximum efficiency. 
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6.1. Convergence test. We test the convergence of Algorithm [2] with S — {{x,y) <E Q \ x < 
l/3}Udn. Assume isotropic, unit speed and costs f{x,a) — a, K — K — 1,A — S^, with maximum 
budget B = 1. We consider the case of a point "target" T = (1,0) by choosing the boundary 
conditions 



(50) w{x,y,b) ^ 



{x,y)=T 

+0O {x,y)edrt\{r}. 



Note that the problem is symmetric with respect to the x axis; moreover, an explicit formula for 
the exact solution can be derived from simple geometric considerations. To simplify the notation, 
we define a few useful geometric entities on the domain (see Figure [6] for a diagram): 

C = the vertical line segment at a; = - for — < 2/ < . 

3 3 3 

^^-^^ Pi, P2 = the upper and lower end points of £, respectively. 



Pix,y) = 



Pi, ify>0; 
P2, otherwise. 



We shall only describe an optimal (feasible) path from an arbitrary point x = {x,y) to T, since 
w{x,y,b) is simply the length of that path. For convenience we will use the notation "a; ^ y" as a 
shorthand for "a (directed) straight line segment from x to y" . 

We begin by describing the optimal path from x G S. li x ^ T passes through £, this line 
segment is precisely the optimal path. If the line does not pass through C, the optimal path is 
X — ^ P{x) T ■ Next, we describe the optimal path from x G U with initial budget b. Clearly, if 
X G U is more than b distance from both 2; = | and T, then w{x,b) — +00. Also, if x is within b 
distance from T, the optimal path is a; ^ T. Otherwise, the optimal path will have to first visit 
S (and reset the budget to _B), before moving to T. This situation can be further divided into two 
cases: 

Case 1. if a; G is within 6 distance from £, the optimal path is to move from x ^ y ^ T , where 
y is a point on C such that \\x — y|| <b minimizing \\x — y\\ + \\y — T||. 

Case 2. Otherwise, the optimal path is x ^ z ^ P{x) T, where z is the closest point on the line 
a; = I to P{x) such that ||a3 — z\\ < b. 

From the above, it should be clear that for each 6 > the value function will have a discontinuous 
jump on V{b) = {{x, b) eil^lx eU, \\x - T\\ = 6}. 

We compare the numerically computed solution to the exact solution in the Li and Loo norms. 
For the L^o norm, we compare the solutions on a subset il^ C ile where the w is known to be 
continuous: 

(52) ni = {{x,b) e I ||a;- y|| > e for aU (y, 6) e V{b)}. 

In particular we investigate the Loo norm errors for e — e{h) = 3/i and e = 0.1 (independent of h). 
The Li errors are computed over the whole computational domain. The errors are reported in Table 
[2] A contour plot of the numerical solution on the top 6-slice is shown in Figure [6j 

Remark 6.1. The convergence observed in Table [2] is actually stronger than predicted by theory. 
First, in this numerical test we always chose Ab = h, whereas the theory (even for the no resets case!) 



guarantees convergence only for h = o(A6); see Remark 5.7 Secondly, the Loo-norm convergence is 
guaranteed on any fixed compact set away from discontinuity, but the choice of e — 3h goes beyond 
that. 

Remark 6.2. At the same time, the observed rate of convergence (in all norms) is less than one 
despite our use of the first-order accurate discretization. This is not related to any discontinuities 
in the solution, but is rather due to the "rarefaction fans" (characteristics spreading from a single 
point) present in this problem. Indeed, this phenomenon is well known even for computations of 
distance function from a single target point: a cone- type singularity in the solution results in much 
larger local truncation errors near the target, thus lowering the rate of convergence. A "singularity 
factoring" approach recently introduced in |23j allows to circumvent this issue at the target, but 
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N 




ioo(ii|), e = 3/1 


r ( C\€ \ A 1 

-t'oo(i^l), e = 0.1 


51 


(J.Uolz 


A A/^ O 1 

O.Ooel 


A A£? O 1 

0.0681 


121 


0.0309 


0.0513 


0.0416 


241 


0.0151 


0.0401 


0.0255 


481 


0.0076 


0.0265 


0.0151 


961 


0.0039 


0.0162 


0.0090 



Table 2. Errors to the exact solution for various grid sizes N (note h — 2/{N — 1)). 



we note that there are two more rarefaction fans spreading from points Pi and P2; see Figure |6] 
(Intuitively, this is due to the fact that optimal trajectories from infinitely many starting points pass 
through Pi or P2 on their way to T.) Since in general examples the locations and types of such 
rarefaction fans are a priori unknown, "factoring methods" similar to those in |23| are not applicable. 



s 




, — 

u 




c 






P2' 






Figure 6. Left: an illustration of the domain. The darker region is S and the 
lighter region is U. The target is T — (1,0). The black dotted line is the circle 
of radius B — 1 centered about T va U. Right: a contour plot of the numerical 
solution of ?ii(a;,y, 1) for grid size N = 961. The vertical white dotted line is the 
interface between S and at a; = 1/3. 



6.2. Geometry of S and the number of iterations. We illustrate how the information propa- 
gates within the main loop of Algorithm [2j Since the reachable part of the safe set {Sti) is obtained 
iteratively, it might seem natural to expect that the iterative process stops once all the reachable 
components of S are already found (i.e., once 5^ = S-jz, also ensuring that MFL'' = MFL). Here 
we show that this generally need not be the case and the value function might require more 
iterations to converge after that point. Roughly speaking, this occurs when the order of traversal 
of some optimal path through a sequence of connected components of Stz differs from the order in 
which those components were discovered. Mathematically, the extra iterations are needed because 
the correct values of W2 are not yet known on F, and the values of W^^'"'"*'^ on the MFL are still 
incorrect as a result. 

Consider the following "pathological" example (shown in Figure [?]): S consists of eight square 
blocks 5*1, ^2, . . . , with side lengths 0.4, enumerated counter-clockwise, with T € Si. To simplify 
the discussion, we assume that Sg :— Si and introduce the notation for "unsafe corridors" between 
the safe squares: 

= convex_hull(5,, U S^+i) D U; i = 1, . . . , 8. 

We set 




10 03 e 5 U Ci U C2 U • • • U C7, 
0.1 otherwise. 
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Note that / = 0.1 in Cg. The target isT — (—0.5, —0.5) and the maximum budget is B = 1.5. Note 
that for all the starting positions in S2, ■ ■ ■ ,Ss, the corresponding constrained optimal trajectories 
will run toward Si clockwise (to avoid crossing through the slow region in Cg). The same is true for 
all starting positions in Ci, . . . ,Cj and even for points in Cg that are significantly closer to Ss- 

This problem was specifically designed to illustrate the difference between the evolution of the 
reachable set fi^ = U^US^ and the evolution of the "fully converged" set J^'' = {x \ 'w^{x) = w{x)}, 
on which the value function is correct after the fc-th iteration. Both sets eventually contain all Si's 
and Ci's, but a careful examination of Figure[8| reveals the difference. For the reachable set, fl]^ = Si 
initially, and the algorithm iteratively discovers the reachable Si's (and C^'s) simultaneously in both 
the clockwise and counter-clockwise directions. More than one Si can be "discovered" per iteration 
in each direction: e.g., at iteration fc = 3, Phase I of the algorithm discovers that C2 U C3 C 
owing to a feasible path that "passes through the corner" shared by C2 and C3; this subsequently 
leads to Phase II discovering that 5*3 U ^4 C 5^. The same argument implies that Cg U C7 C U!^ 
and 5*6 U S'7 C S^. After one more iteration we already see that 6*5 C S!^ and another iteration is 
sufficient to recover the entire reachable set $7 7^ = r2|j. 

However, on a large part of the value function is still quite far from correct at that point; 
e.g., a comparison of level curves in C5 and 5*5 shows that C5 ^ . Since T € Si C S and 5*1 is 
convex, we have Si C J"^. It is similarly easy to see that {iF^~^ U Ck-i U Sk) C iF^ for k = 2, 8. 
Thus, it takes eight iterations to correctly compute w on the entire safe set and one more iteration 
to cover those unsafe points (including a part of Cg), whose optimal trajectories take them first to 
5g and then clockwise (through the "fast belt") toward Si. We note that, as iterations progress, 
the value function need not be converged on recently discovered components of S-n, even if the level 
sets already show the generally correct (clockwise) direction of optimal trajectories. For example, it 
might seem that Sg, € J-^ , but a careful comparison of level curves shows that the value function is 
still incorrect even on Cg and 5*7. (This is due to the fact that the reachability of S% is discovered 
by feasible trajectories passing through a common corner of Cg and C7.) Figure [9] confirms this by 
showing the 00-norm of the value changes after each iteration. We observe two key events: 

• the initial drop in value changes, when Q,ti is fully discovered after iteration five 
(at which point the errors are still independent of h and A6); 

• and the convergence (i.e., the drop of value changes to the machine precision) after iteration nine. 



S (shaded) & U (unshaded) Speed function 




Figure 7. Left: The sets S, U and the target T = (—0.5, —0.5) shown by a black 
cross. The eight connected components iSi, S'2, . . . , S'g of 5 are labeled. Right: the 
speed function. Note the slow speed in the corridor Cg between the Si and S'g. 



6.3. Optimal paths and the effect of varying B. We now consider two examples with inhomo- 
geneous speed functions. The first scenario involves a discontinuous speed function that is slow in 
the safe set: 

'1 x&U 
0.3 x(^S. 
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Figure 8. The first six iterations (each after Phase II) of W'^ on b = B. The 
contour lines are scaled logarithmically to avoid bunching in the slow regions of U. 




Figure 9. The max-norm difference between consecutive Wi and . The nu- 
merical convergence tolerance is shown by the thick horizontal dotted line. Both 
and were set to 10^. 



This imposes a curious dilemma: the optimal path tries to avoid S to travel faster to T, but it must 
visit S at least every B distance to keep the budget from depleting. The numerical test for B = 0.4 
is shown in the center plot of Figure |10[ In order to resolve the dilemma, we see that the computed 
optimal path tends to travel along the interface F on the U side while occasionally making short 



visits into S to reset the budget. We have added small white circles to the plot in Figure 10 (center) 
to identify the locations of these "reentry" points. 

The second example illustrates the robustness of the numerical scheme to non-trivial speed func- 
tions in U: we set 

f{x) 1 — 0.5sin(57ra;) sin(57ry). 



The computed value function and a sample path are shown in the right plot of Figure 10 
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Figure 10. Sample optimal paths on the "islands" example with inhomogeneous 
speed functions. 



Next, we give numerical examples showing the effects of varying B. Figure 11 illustrates these 



effects on the "islands" examples (as in figure 10 1 . The speeds were set to / = 1 on all of the domain. 
While the optimal path is computed from the same initial point (0.8, 0.5), note the large changes in 
its global behavior. 




Figure 11. Sample optimal paths on the "islands" example for varying B. Left to 
right: B — 0.3, 0.4 and 0.5, respectively. 



6.4. Constrained contiguous visibility time. We apply Algorithm [2] to a problem involving 
visibility exposure: suppose the objective is to move a robot towards a target in the shortest time, 
among opaque and impenetrable obstacles, while avoiding "prolonged exposure" to a static enemy 
observer. In our budget reset problem setting, we impose the prolonged exposure constraint by 
letting the enemy- visible region be U and the non- visible region as S. This is similar to the problem 
considered in [28], except that once the robot enters the non- visible region, it is again allowed to 
travel through the visible region up to the time B. 

The domain consists of four obstacles which act both as state constraints and as occluders. The 
static observer is placed at (0.8,0.8), and the corresponding visible set is computed by solving an 
auxiliary (static and linear) PDE on f2 [35] • We compute the value function and optimal paths for 



the same starting location but two different exposure budgets: B = 0.15 and B = 0.3, see figure 12 
Note that, for small B, the budget is insufficient to directly travel across the U 'corridor' between 
the "shadows" of the larger foreground obstacles; for the larger B this shortcut is feasible, thus 
reducing the path length. 

Section 7. Conclusions. 



In this paper we focused on computational methods for optimal control of budget-constrained 
problems with resets. In the deterministic case on graphs, we explained how such problems can be 
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S (shaded) & U (unshaded) 




Figure 12. The problem of constrained contiguous visibility time. The static ob- 
server location is shown by an asterisk. Left: the observer-viewable region is white; 
the occluders/obstacles are black, and their "shadows" are gray. The objective of 
is to find the quickest path connecting the two small squares while avoiding pro- 
longed enemy-exposure. Center and right: contour plots of W at b = B and the 
constrained optimal paths with B — 0.15 and 0.3, respectively. 



solved by noniterative (label-setting) methods. We then introduced new fast iterative methods, suit- 
able for both deterministic and stochastic problems on graphs, as well as for continuous deterministic 
budget reset problems. Throughout, we utilized the causal properties of the value function to make 
sure that dynamic programming equations arc solved efficiently on the new extended domain. In the 
appendix we also describe an iterative algorithm on the original (non-extended) domain for solving 
a related simpler problem of finding the budget- reset- "reachable" parts of the state space. 

We presented empirical evidence of convergence and illustrated other properties of our methods 
on several examples, including path-planning under constraints on "prolonged exposure" to an en- 
emy observer. Even though all selected examples are isotropic in cost and dynamics, only minor 
modifications (with no performance penalties) are needed to treat anisotropy in secondary cost K. 
Anisotropics in primary cost and/or dynamics can be also easily treated by switching to a different 
(non-iterative or fast iterative) method on the safe set S. Several other natural extensions are likely 
to be more computationally expensive, but can be handled in the same framework. First, in some 
applications the resource restoration is more realistically modeled not as an instantaneous reset, but 
as a continuous process on S. Alternatively, resets might be also modeled as conscious/optional 
control decisions available on S, with an instantaneous penalty in primary cost. Second, differential 
games can be similarly modified to account for limited/renewable resource budgets. Finally, both 
the dynamics and the budget changes might be affected by some random events, leading to stochas- 
tic trajectories in the extended domain. All of these extensions are of obvious practical importance 
in realistic applications, and we hope to address them in the future. 



Appendix A. Determining the Reachable Set without expanding the state space. 



For certain applications, one may be interested in computing only the reachable sets: 

= {{x, b) I w{x, fe) < oo} C 

While ^Ifi is indeed a byproduct of Algorithm [2] the goal of this section is to recover ^l-ji by a "lower 
dimensional" algorithm; e.g., using only computations on a grid ^ in 17 C i?". 

We note that the reachable set can be alternatively defined as = {{x,b) \ v{x) < b}, where 



v{x) is the minimum needed starting budget, defined in (42) and (43|. Thus, it suffices to compute 
V from (44 1, which in turn requires knowing S-fi — {x d S \ W2{x) < cxd} to impose the boundary 
condition ( 43 ) . We take an approach similar to Algorithm pj by iteratively growing the known 
reachable set. The main difference is, instead of computing wfon flu, we solve a simpler boundary 
value problem on U itself - here we only need to know whether T = {x E dft \ q(x) < c»} is 
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reachable (from each point in lA and starting with a given budget), instead of finding the minimum 
cost of reaching it. 

To this end, we introduce the auxihary functions g'^ : f2 — > ii, computed iteratively in tandem 
with w'^', that can be used to extract the reachable sets in the safe set: Sij^ — {x & S \ g'^{x) < oo}. 



Similar to the recursive algorithm in section 5.2 we define initially 

'o xeT, 



(53) v'{x)=g°ix) = 



oo otherwise; 



the k-th auxiliary function is defined as the (discontinuous) viscosity solution of the boundary value 
problem 

||V/(a;)|| =0 xe int(5), 

'0 xeT 

^ I liminfw'=(a;) a; e r\r 



X'^X 
X'eu 



^oo otherwise. 



(See Remark 5.1 regarding the 'lim inf ' in the boundary condition above.) The MFL*^ function v'^ 



defined as described in section 



5.4 



except that ^ is no longer computed using Wi; instead, it is 
computed from g^~^ via the formula S!^^ = {x S \ g''^^{x) < oo}. Note that (54) implies that 
for each (closed) connected component S' of 5, 

g''(x)= min g''(x'), for all a; e 5', fc = 0, 1, 2 . . . 

^ ^ x'edS' ^ ' 

Once the iterative procedure reaches a steady state, the approximation to v can be used to extract 
(an approximation to) the reachable set il-ji. An important observation is that S!^ represents all 
point in S that can be reached from T by a feasible path containing at most k contiguous segments 
through S\T. Thus, under the assumption that the number of connected components of S is finite, 
the algorithm will converge in a finite number steps. 

The numerical approximations V'', of , g'^, respectively, can be solved on Q using standard 
numerical methods described in section [5. 1| The iterative method is outlined in Algorithm [3] 



1 Initialization: 

2 V'^ix) = G°{x) 



Va; e dg, 

oo yxeg\dg; 



3 Main Loop: 

4 foreach fc = 1, 2, . . . until and stop changing do 

5 Compute y , using G^-^ to specify 5^" ^ 



6 Compute G^ from equation 54 using V'^ on gu; 

7 end 

Algorithm 3: Reachability algorithm for the budget reset problem. 
As an illustration, we implemented Algorithm [3] and applied it to the same problem as in the 



left plot of Figure 11 (case B = 0.3). The first four iterations are shown in Figure 13 The domain 
[—1,1]^ was discretized on a 400 x 400 grid. The algorithm halted after five iterations and took 0.45 
seconds. 
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Figure 13. First four iterations for of Algorithm |3] The blue contours depict the 
level set {V''{x) = B = 0.3} and the shaded region is the set S!j^ ~ {G''{x) < oo} 
at the fc-th iteration. The target is shown by a '+' at T = (—0.7, —0.4). 
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